[Cadence: Vision] ResNet18 & ResNet50: Optimized, DMA-enabled, functional#19111
cad-rlc wants to merge 9 commits into
@mcremon-meta @hsharma35 @zonglinpengmeta
mcremon-meta left a comment
Will continue the review later, but can we clean up the set of files first? I don't quite understand why so many files are checked in, including CMakeFiles etc.
@@ -0,0 +1,25 @@
Collecting matplotlib
not sure what this file is?
kernel_name: impl::generic::quantized_matmul_asym8uxasym8u_asym8u_out

- func: cadence::im2row.out(Tensor input, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, Tensor in_zero_point, bool channel_last=False, *, Tensor(a!) out) -> Tensor(a!)
- func: cadence::im2row.out(Tensor input, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, Tensor in_zero_point, bool channel_last, *, Tensor(a!) out) -> Tensor(a!)
@mcremon-meta A few stale files were accidentally committed in this pull request. We are addressing the issue and will submit a new PR shortly.
…onal
- Add DMA-optimized operators: conv2d (1x1/3x3/7x7), maxpool, quantize/dequantize, relu, add, mean, softmax, linear
- Add new operators: embedding, full, im2row, quantized_fully_connected, quantized_layer_norm, quantized_matmul, requantize, view_copy
- Add vision/kernels library and quantized_ops.h header
- Add config generator for DMA buffer sizing
- Update functions_vision.yaml and CMakeLists.txt
- Add third-party XAI libraries (libxai, libxai_common, libxa_nnlib)
- FACTO submodule update
@mcremon-meta
Summary
This PR adds optimized Cadence Vision 130 DSP operators for ResNet-18 and ResNet-50 inference. All operators are DMA-enabled and use ping-pong tiling, with tile sizes chosen from the available DRAM size. When no DRAM is available, or the available DRAM is insufficient for a given kernel, the operator falls back to cache mode. All operators are functionally verified (int8 quantized, NCHW layout).
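For readers unfamiliar with the pattern, here is a minimal sketch of ping-pong tiling with a cache-mode fallback, using an int8 ReLU as the per-tile kernel. It is an illustration only, not the code in this PR: `dma_copy`, `relu_tile`, `relu_int8`, and `kMinTileBytes` are hypothetical names, the "DMA" is a synchronous `memcpy` stand-in, and the four-way split of the scratch buffer is an assumed sizing policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an asynchronous DMA transfer. On real hardware
// this would be a non-blocking transfer followed later by a wait.
static void dma_copy(int8_t* dst, const int8_t* src, std::size_t n) {
  std::memcpy(dst, src, n);
}

// Per-tile kernel: quantized ReLU clamps every value below the zero point.
static void relu_tile(const int8_t* in, int8_t* out, std::size_t n,
                      int8_t zero_point) {
  for (std::size_t i = 0; i < n; ++i) out[i] = std::max(in[i], zero_point);
}

// Assumed threshold below which tiling is not considered worthwhile.
constexpr std::size_t kMinTileBytes = 64;

// Returns true if the ping-pong DMA path ran, false if it fell back to
// operating directly on system memory (cache mode).
bool relu_int8(const int8_t* in, int8_t* out, std::size_t n,
               int8_t zero_point, int8_t* dram, std::size_t dram_bytes) {
  assert(in != nullptr && out != nullptr);
  // Two input tiles and two output tiles (ping + pong) share the scratch.
  const std::size_t tile = dram_bytes / 4;
  if (dram == nullptr || tile < kMinTileBytes) {
    relu_tile(in, out, n, zero_point);  // cache-mode fallback
    return false;
  }
  if (n == 0) return true;

  int8_t* in_buf[2] = {dram, dram + tile};
  int8_t* out_buf[2] = {dram + 2 * tile, dram + 3 * tile};
  const std::size_t num_tiles = (n + tile - 1) / tile;

  // Prefetch the first tile into the "ping" buffer.
  dma_copy(in_buf[0], in, std::min(tile, n));
  for (std::size_t t = 0; t < num_tiles; ++t) {
    const std::size_t cur = t & 1;
    const std::size_t off = t * tile;
    const std::size_t len = std::min(tile, n - off);
    // Start loading the next tile into the other buffer; with a real DMA
    // engine this transfer would overlap with the compute below.
    if (t + 1 < num_tiles) {
      const std::size_t next_off = off + tile;
      dma_copy(in_buf[cur ^ 1], in + next_off, std::min(tile, n - next_off));
    }
    relu_tile(in_buf[cur], out_buf[cur], len, zero_point);
    dma_copy(out + off, out_buf[cur], len);  // write the result tile back
  }
  return true;
}
```

Calling `relu_int8` with a scratch buffer of at least `4 * kMinTileBytes` bytes takes the tiled path; passing `nullptr` or a smaller buffer takes the fallback. Both paths produce identical output, which is the property a functional test would check.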
Operators
- Conv2d (quantized_conv2d_nchw)
- MaxPool2d (maxpool_exec_mxnj2)
- Mean / AdaptiveAvgPool (mean_exec_dma)
- Quantize / Dequantize (quantize_per_tensor, dequantize_per_tensor)
- Quantized ReLU (quantized_relu)
- Quantized Linear (quantized_linear_out)
- Add (op_add)
- Softmax (op_softmax)
Build Configuration
Test Configuration
Vision 130 DSP core configuration used for testing:
Performance Results
We observed approximately 45× and 55× speedups for complete ResNet-18 and ResNet-50 inference, respectively, with the optimized operators compared to the generic operators. These measurements used memory modeling (--mem_model --mlatency=40 --blockrepeat=1 --write_delay=40 --write_repeat=1) with 64K DRAM0 and DRAM1.
cc @mcremon-meta @hsharma35 @zonglinpengmeta